Migrating HL compile and export to infer APIs #214
base: main
Conversation
Please rebase
Change-Id: If27fbc1636ed1fe9b475d07cef7c83ed7dc46ca8 Signed-off-by: Asmita Goswami <[email protected]>
Force-pushed from acf8ca3 to 12da558
QEfficient/cloud/export.py
Outdated
) # type: ignore
logger.info(f"Generated onnx_path: {onnx_model_path}, onnx_dir_path: {onnx_dir_path}")
logger.info(f"Exporting Pytorch {model_name} model to ONNX...")
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name, cache_dir)
We should not restrict the CLI APIs to using only AutoModelForCausalLM. They should stay generic, since we support new auto classes.
Change-Id: If27fbc1636ed1fe9b475d07cef7c83ed7dc46ca8 Signed-off-by: Asmita Goswami <[email protected]>
…transformers into hl_compile_api_infer
Signed-off-by: Asmita Goswami <[email protected]>
hf_token: Optional[str] = None,
local_model_dir: Optional[str] = None,
Why are you removing this?
@@ -92,7 +76,6 @@ def main(
    model_name=model_name,
    cache_dir=cache_dir,
    hf_token=hf_token,
    local_model_dir=local_model_dir,
?
QEfficient/cloud/infer.py
Outdated
config = AutoConfig.from_pretrained(model_name)
architecture = config.architectures[0] if config.architectures else None

model_class = architecture_mapping.get(architecture)
if not model_class:
    logger.error(f"Model class for model name {model_name} not found in mapping")
    return

qeff_model = model_class.from_pretrained(model_name)
Why? Directly use QEFFAutoModelForCausalLM.from_pretrained.
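In other words, the per-architecture lookup collapses to the direct call already used in export.py:

```python
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name, cache_dir)
```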
Instead of writing your own dictionary, please make use of MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES and MODEL_FOR_CAUSAL_LM_MAPPING_NAMES from transformers:

from transformers.models.auto.modeling_auto import MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES, MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

Then check whether the architecture is present in the values of either of these two dictionaries and call our corresponding auto class based on that.
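A minimal sketch of that dispatch, assuming QEFFAutoModelForImageTextToText is the QEff counterpart for the image-text-to-text mapping (that class name is an assumption for illustration, and MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES requires a recent transformers version):

```python
from transformers import AutoConfig
from transformers.models.auto.modeling_auto import (
    MODEL_FOR_CAUSAL_LM_MAPPING_NAMES,
    MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES,
)

from QEfficient import QEFFAutoModelForCausalLM

model_name = "gpt2"  # example model

config = AutoConfig.from_pretrained(model_name)
architecture = config.architectures[0] if config.architectures else None

# Both transformers mappings store architecture class names as values
# (e.g. "LlamaForCausalLM"), so the config's architecture string can be
# matched against them directly instead of maintaining a hand-written dict.
if architecture in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
    model_class = QEFFAutoModelForCausalLM
elif architecture in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.values():
    from QEfficient import QEFFAutoModelForImageTextToText  # assumed counterpart class

    model_class = QEFFAutoModelForImageTextToText
else:
    raise NotImplementedError(f"Unknown architecture={architecture}")

qeff_model = model_class.from_pretrained(model_name)
```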
QEfficient/cloud/infer.py
Outdated
# Map model's architecture to class
architecture_mapping = {
    "LlamaForCausalLM": QEFFAutoModelForCausalLM,
    "GPT2LMHeadModel": QEFFAutoModelForCausalLM,
    "MistralForCausalLM": QEFFAutoModelForCausalLM,
    "FalconForCausalLM": QEFFAutoModelForCausalLM,
    "GPTJForCausalLM": QEFFAutoModelForCausalLM,
    "GemmaForCausalLM": QEFFAutoModelForCausalLM,
    "Gemma2ForCausalLM": QEFFAutoModelForCausalLM,
    "Phi3ForCausalLM": QEFFAutoModelForCausalLM,
    "Qwen2ForCausalLM": QEFFAutoModelForCausalLM,
    "GPTBigCodeForCausalLM": QEFFAutoModelForCausalLM,
}
remove
Signed-off-by: Asmita Goswami <[email protected]>
QEfficient/cloud/export.py
Outdated
from QEfficient.exporter.export_hf_to_cloud_ai_100 import qualcomm_efficient_converter
from QEfficient.utils import check_and_assign_cache_dir, onnx_exists
from QEfficient.transformers.models.modeling_auto import QEFFAutoModelForCausalLM
Use from QEfficient import QEFFAutoModelForCausalLM instead; it's present in __init__.
QEfficient/cloud/export.py
Outdated
    full_batch_size=full_batch_size,
) # type: ignore
logger.info(f"Generated onnx_path: {onnx_model_path}, onnx_dir_path: {onnx_dir_path}")
logger.error(f"Model class for model name {model_name} not found in mapping")
raise NotImplementedError(f"Unknown architecture={architecture}, either use specific auto model class for loading the model or raise an issue for support!")
We should fail here, which will force the script to exit.
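A minimal sketch of the suggested replacement (model_class and architecture come from the surrounding code in the CLI entry point):

```python
# Fail hard instead of logging and returning, so the script exits non-zero.
if model_class is None:
    raise NotImplementedError(
        f"Unknown architecture={architecture}, either use specific auto model class "
        "for loading the model or raise an issue for support!"
    )
```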
QEfficient/cloud/export.py
Outdated
) # type: ignore
logger.info(f"Generated onnx_path: {onnx_model_path}, onnx_dir_path: {onnx_dir_path}")
logger.error(f"Model class for model name {model_name} not found in mapping")
return
remove
QEfficient/cloud/infer.py
Outdated
    enable_qnn=enable_qnn,
    qnn_config=qnn_config,
)
logger.error(f"Model class for model name {model_name} not found in mapping")
Same here: raise the error instead of logging.
QEfficient/cloud/infer.py
Outdated
config = AutoConfig.from_pretrained(model_name)
architecture = config.architectures[0] if config.architectures else None

if architecture in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
    model_class = QEFFAutoModelForCausalLM
else:
    # Handle onnx model generation
    onnx_model_path = get_onnx_model_path(
        model_name, cache_dir, tokenizer, hf_token, local_model_dir, full_batch_size
    )  # , base_dir_name)

    #########
    # Compile
    #########
    _ = QEfficient.compile(
        onnx_path=onnx_model_path,
        qpc_path=os.path.dirname(
            qpc_dir_path
        ),  # We need to pass parent directory of qpc_dir_path, as the compile function handles the qpcs directory creation
        num_cores=num_cores,
        batch_size=batch_size,
        prompt_len=prompt_len,
        ctx_len=ctx_len,
        mxfp6=mxfp6,
        mxint8=mxint8,
        aic_enable_depth_first=aic_enable_depth_first,
        mos=mos,
        device_group=device_group,
        full_batch_size=full_batch_size,
        allow_mxint8_mdp_io=allow_mxint8_mdp_io,
        enable_qnn=enable_qnn,
        qnn_config=qnn_config,
    )
    logger.error(f"Model class for model name {model_name} not found in mapping")
    return

qeff_model = model_class.from_pretrained(
    pretrained_model_name_or_path=(local_model_dir if local_model_dir else model_name),
    cache_dir=cache_dir,
    hf_token=hf_token,
    full_batch_size=full_batch_size,
)
Since this code is a copy of the same logic in the export method, you can create a common method in a utils file in the cloud folder and use it from there. You could call it load_qeff_model.
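A sketch of what that shared helper could look like, say in QEfficient/cloud/utils.py (the name load_qeff_model comes from the review suggestion; the signature is an assumption pieced together from the parameters used at the two call sites):

```python
from typing import Optional

from transformers import AutoConfig
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

from QEfficient import QEFFAutoModelForCausalLM


def load_qeff_model(
    model_name: str,
    cache_dir: Optional[str] = None,
    hf_token: Optional[str] = None,
    local_model_dir: Optional[str] = None,
    full_batch_size: Optional[int] = None,
):
    """Resolve the QEff auto class from the model config and load the model."""
    config = AutoConfig.from_pretrained(model_name)
    architecture = config.architectures[0] if config.architectures else None

    if architecture in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
        model_class = QEFFAutoModelForCausalLM
    else:
        raise NotImplementedError(
            f"Unknown architecture={architecture}, either use specific auto model "
            "class for loading the model or raise an issue for support!"
        )

    return model_class.from_pretrained(
        pretrained_model_name_or_path=(local_model_dir if local_model_dir else model_name),
        cache_dir=cache_dir,
        hf_token=hf_token,
        full_batch_size=full_batch_size,
    )
```

Both export.py and infer.py could then call this one helper instead of duplicating the dispatch-and-load logic.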
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Migrating HL compile API and export API to infer APIs